Gaussian Process Cosmography 
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Gaussian processes provide a method for extracting cosmological information from observations 
without assuming a cosmological model. We carry out cosmography - mapping the time evolution 
of the cosmic expansion - in a model-independent manner using kinematic variables and a geometric 
probe of cosmology. Using the state of the art supernova distance data from the Union2.1 compila- 
tion, we constrain, without any assumptions about dark energy parametrization or matter density, 
the Hubble parameter and deceleration parameter as a function of redshift. Extraction of these 
relations is tested successfully against models with features on various coherence scales, subject to 
certain statistical cautions. 



I. INTRODUCTION 

Cosmic acceleration is a fundamental mystery of great 
interest and importance to understanding cosmology, 
gravitation, and high energy physics. The cosmic ex- 
pansion rate is slowed down by gravitationally attractive 
matter and sped up by some other, unknown contribu- 
tion to the dynamical equations. While great effort is 
being put into identifying the source of this extra dark 
energy contribution, the overall expansion behavior also 
holds important clues to origin, evolution, and present 
state of our universe. 

Indeed, by studying the expansion as a whole one 
sidesteps the issue of exactly how to divide the gravi- 
tationally attractive (e.g. the imperfectly known matter 
density) and the accelerating contributions, and whether 
they are independent or have some interaction (see, e.g., 
[1-3]). By concentrating on the kinematic variables - the 
expansion properties as a function of redshift z or scale 
factor a = 1/(1 + z) - one does not need to know the in- 
ternal structure of the field equations, i.e. the dynamics. 
The clarity and focus on kinematics trades off against the 
loss of information on the specific dynamics. 

Another gain comes from using geometric measure- 
ments - cosmic-distance variables that do not depend on 
the particular forces and mass densities. The sensitiv- 
ity of Type la supernova measurements of the distance- 
redshift relation to the deceleration parameter was used 
to discover the accelerated expansion [4, 5] . Other probes 
such as gravitational lensing, galaxy-clustering statis- 
tics, cluster-mass abundances, etc. provide valuable in- 
formation, but are dependent on non-kinematic variables. 
Some techniques, such as distances from baryon acoustic 
oscillations and Sunyaev-Zel'dovich effects in clusters, are 
on the fence, nominally geometric but having implicit de- 
pendence on the gravitational interaction of matter and 
so the force law dynamics. 

Given distance and redshift measurements, the cosmic 
expansion rate is related by a derivative of the data, and 
the deceleration parameter by a further derivative. This 
is problematic for data with real world noise, as the dif- 
ferentiation further amplifies the noise. Various smooth- 



ing procedures have been suggested, e.g. [6], but tend to 
induce bias in the function reconstruction due to para- 
metric restriction of the behavior or to have poor error 
control. Using a general orthonormal basis or principal 
component analysis is another approach, to describe the 
distance- redshift relation (e.g. [7]) or the deceleration pa- 
rameter [8] , or using a correlated prior for smoothness on 
the dark energy equation of state [9], but in practice a 
finite (and small) number of modes is significant beyond 
the prior, essentially reducing to a parametric approach. 
Gaussian processes [10] offer an interesting possibility for 
improving this situation. 

Gaussian processes (GP) have been used recently [11- 
13] in a dynamical reconstruction, going from a set of 
realizations of the equation of state parameter w{z) of 
the dark energy component forward to comparison of the 
derived distance-relation to the distance data. The com- 
parison was carried out through a Markov Chain Monte 
Carlo (MCMC) assessment of likelihoods. Note that the 
GP interpolation does not occur between data points but 
rather on an arbitrary grid of some possibly unmeasured 
quantity. This approach is intriguing, but relies on sep- 
aration of the matter density from the dark energy be- 
havior, i.e. it works within a dynamical framework. 

The approach in this paper takes a fundamentally dif- 
ferent path. We begin with the observations of supernova 
distances and here consider only kinematic quantities. 
Modeling the cosmic distance relation as a smooth kine- 
matic function drawn from a GP, the value of the function 
at any redshift is then predicted directly through testing 
the GP model against the data. The cosmic expansion 
can then be extracted from the means and covariance ma- 
trices of the Gaussian process realizations (weighted by 
a posterior) directly for quantities related linearly to the 
original GP, even through derivatives. This allows us to 
probe the rate and acceleration of the cosmic expansion 
in a highly model-independent manner (at the price of 
focusing on only this type of information) . One can view 
this as a top-down approach, complementary with the 
bottom-up approach of starting with theoretical quanti- 
ties and working toward the data, and then applying a 
likelihood comparison. 
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In Sec. II we lay out the basics of the kinematic cos- 
mology quantities and the Gaussian process formalism. 
Readers familiar with the method or eager for results 
could go to Sec. Ill where we analyze the results of per- 
forming the GP reconstruction, for both current and sim- 
ulated data. We summarize the cosmological implica- 
tions and discuss the prospects in Sec. IV. Appendices 
present details of tests of the robustness of the statistical 
techniques. 



II. COSMOGRAPHIC RECONSTRUCTION 

A. Expansion History 

Homogeneity and isotropy determine the metric of the 
universe to be of the Robertson- Walker form, which for 
a spatially flat universe (from a theory or inflationary 
prior) is 



ds 2 



- dt 2 + a 2 (t) [dr 2 + r 2 (d0 2 + sin 2 6 dej) 2 )] , ( 1 ) 



where t is a time coordinate and r the coordinate dis- 
tance. The key quantity is the cosmic expansion or scale 
factor a(t), or equivalently the redshift z = a^ 1 — 1. 

Without using any field equations, such as the Fried- 
mann equations (the Einstein equations specialized to the 
above metric) - and hence in a purely kinematic way - 
we can still define a conformal distance 



n 



dt 



dr . 



(2) 



and build a luminosity distance dx,(a) = a 77(a), and an 
angular diameter distance d a {a) = arj if desired. 

Flux and redshift measurements of a set of standard- 
ized candles such as Type la supernovae deliver observa- 
tional access to rj{z). From this function directly comes 
the inverse Hubble parameter 



and the deceleration parameter 



dr} 
dz 



aa 

i{z) = -— 
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dz 



- 1 



(3) 



(4) 



This a top-down approach, starting with observable dis- 
tances and proceeding to cosmological kinematic quanti- 
ties. 

We follow this top-down approach because of its use- 
ful properties of direct relation to kinematics, avoidance 
of reliance on a cosmological model or knowledge of the 
matter density, and well defined and efficient error prop- 
agation within the GP method. 

An alternative approach with different characteristics 
is bottom-up. There, one would either parametrize q(a) 
(or perhaps a dark energy equation of state, or pressure 
to density, ratio Wd e (a) = [2q(a) — l]/[3f2d e (a)], which 



involves the dimensionlcss dark energy density fide) or 
choose realizations of q(a) from a statistical distribution. 
Parametrizing q(a) allows straightforward error propa- 
gation up to the distances, for comparison to the data; 
however one must ensure that the parametrization does 
not restrict or bias the results. Note that choosing a form 
q(a) is an explicitly dynamical assumption, breaking the 
kinematic nature of the analysis [14]. In terms of the 
equation of state, the WQ-w a form Wd e {a) = wo+w a (l — a) 
is highly robust, reconstructing d^a) to better than 0.1% 
for a wide array of models [15] but does require a sepa- 
ration into matter density and dark energy behavior. 

If one uses statistical realizations of q(z), then the error 
propagation necessary, including the covariances between 
values i at different redshifts, is 



Cov[Hi,Hj 



Cov[di, dj 



HiHj x 
dz 1 



dz" 



Cov[qi,qj] 



1 + z' Jo l + z'- 

(1 + Zi)(l + Zj)x 

dz' dz" 



(5) 
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This can be slow numerically, especially in a MCMC like- 
lihood evaluation. 

However, for a GP the relation between the covariance 
of a quantity and its derivative (as we use in the top- 
down approach) is particularly simple and furthermore 
one can avoid functional parametrizations or statistical 
distributions of the cosmological variables. 



B. Gaussian Process Modeling 

We begin with the assumption that the stochastic 
data is described by a Gaussian process that corre- 
sponds to the cosmological function r](z). The effec- 
tive supernova magnitudes at peak brightness, m, and 
their associated covariance are derived from light-curve 
data [e.g. 16]. Those peak magnitudes transformed by 
(1 + z)10 m / 5 represent measurements of the conformal 
distance with a nuisance normalization factor, y{z) = 
10 A/ /5 (10pc) _1 7]{z) where M is the absolute supernova 
magnitude. 

Derivatives of Gaussian processes are themselves Gaus- 
sian processes (with some ignorable pathological excep- 
tions). This means that the estimator for the Hubble 
length H^ 1 (a) is also a GP (this does not hold for non- 
flat universes). The deceleration parameter is not a GP 
because of its nonlinear relation to H^ 1 but its mean 
value and covariance can estimated analytically from the 
two GP functions, dy/dz and d 2 y/dz 2 , that it depends 
on. That is, 



v-ii \ d y 

H (a) oc — 

dz 



q(a) 



dry 
dz 2 



- 1 



(6) 
(7) 
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1. Gaussian Process of the Kinematic Function 

The Gaussian processes serve as a regression tool to 
infer directly from distance data the kinematic expansion 
properties as a function of rcdshift z. This provides the 
covariances between the values at different rcdshifts as 
well, which one would expect a physical function to have. 

A Gaussian process is defined as a collection of random 
variables, any finite number of which have a joint Gaus- 
sian distribution [10]. A GP f(z) is specified by a mean 
function m(z) and a covariance function or kernel k(z, z'). 
For a finite set Z of z's, values of the function are drawn 
from a Normal distribution, f ~ Af (m(Z), K(Z, Z)) 
where the matrix element Kij = k(Zi,Zj). 

The mean function m(z) is an initial guess for the func- 
tion, in effect "pre-whitening" the data to reduce the 
dynamic range over which the variations need to be fit. 
Without sufficient care the results can in fact be influ- 
enced by the mean function, so for example assuming 
a ACDM concordance relation is not necessarily a good 
choice. Wc discuss the issues in Appendix A where we 
compare several approaches to choosing a mean function 
and investigate their influence. In the main text we adopt 
an iterated smoothings set for the mean function and ver- 
ify that the final results are not influenced by this input. 

For the covariance function we use a common form, 
the squared exponential [27], 



k(z, z') = o 2 j exp 



»'|2 



2l 2 



(8) 



where a f defines the overall amplitude of the correlation 
(one can think of this as an offset or tilt of the recon- 
structed function from the input mean function), and I 
gives a measure of the coherence length of the correlation. 
These effects are discussed and illustrated in Appendix B. 

Any parameters for the mean function (such as fiducial 
fl m and w, which we do not use), and a 2 and I, are 
hypcrparamcters in the fit. 



2. Gaussian Process of the Data 

In addition to the regression variation represented by 
the GP covariance function, in the data there is intrinsic 
dispersion in the distance indicator and (possibly corre- 
lated) measurement noise. The sum of all these gives the 
GP of the measured data y, with covariance function 

k y {z l7 Zj) = k(zi, Zj) + erf Sij + N(Hi,/ij) , (9) 



where af is the intrinsic dispersion and N(/j,i,iJ,j) is the 
measurement noise covariance matrix. 

Note that because q(z) involves a ratio of distance 
derivatives, it is immune to the absolute amplitude of 
the distance, i.e. Hq or its combination with the abso- 
lute supernova magnitude A4. In using the smoothing 
method to generate the mean functions for the GP (see 
Appendix A) we fit out the absolute amplitude, making 
the kinematics independent of these nuisance parameters. 
Fitting for M is a key step that should not be neglected, 
and its uncertainties must be propagated into the final 
reconstruction uncertainties. Fixing it to a particular 
value can also bias the results. The smoothing method 
has been shown robust for including A4 in [25], and a 
similar approach has been used in [17] in reconstruction 
of the expansion history of the universe by combining a 
smoothing method and Crossing Statistic [18, 19]. 



3. Inferring Kinematic Functions from Data 

Given data y measured at a set of points Z we want a 
faithful reconstruction of the distance 77, as well as its 
derivatives, at some other set of points Z\. Call the 
reconstructed function f . In GP, the joint probability 
distribution is given by 



m(Z) 
m(Zi) 



K y (Z,Z) K(Z,Z X ) 
K(Z 1 ,Z) K(Z 1 ,Z 1 ) 



(10) 

Here the subscript y is just to clearly indicate the GP of 
the input data. The conditional distribution of f given 
the data is described by 

f = m(Zi) + K{Zx,Z)K~\Z, Z) y (11) 

Cov (f) = K{Z U Z X ) - K{Z X ,Z)K- X (Z, Z)K(Z, Z x ) . 

(12) 



The probability distribution functions (PDFs) of the 
reconstructed functions (see Eq. Al for details) are inte- 
grated over the hyperparameter space, weighted by the 
hypcrparamctcr posterior distribution. For each point in 
hyperparameter space the PDF of the GP function and 
its derivatives (e.g. the distance, Hubble length, and sec- 
ond derivative entering the deceleration parameter) can 
be written analytically: 



" y " 
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r m(Z) - 




f 

f 






m(Zi) 
m'(Z!) 




f" 
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(13) 
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where 



are given by 



dzfdz? 



(14) 



and a prime indicates d/dz. 

The inferred mean and covariance of the derivatives 
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(15) 
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r 
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(161 



For q(z), which is not a GP, the mean is given by Eq. (7) 
and its variance is 



Var[ g (z)] = (q+1 



Var [*,'(*)] 



y' 2 {z) 

Var[y"(z)] 

y" 2 (z) 



, Cov[y'(z),y"(z)] 

' i/W(*) 



(17) 

To integrate over the hyperparamctcr space (with its 
non-Gaussian posterior) we can cither perform a Monte 
Carlo integration or do grid sampling. Since we only have 
two hyperparameters, er 2 - and I, we use grid sampling, 
equispaced in logarithm with priors oflO -5 < a ? < 1 
and 1(T 2 < I < 10 2 = 1.6. The final reconstructed 
results are weighted averages from the posterior based 
on results from all points in the sampled hyperparameter 
space. See Appendix B for further details. 



that are useful for testing the GP reconstruction. The 
last model is a rapidly evolving dark energy cosmology 
with a sharp transition between a high redshift value of 
the dark energy equation of state parameter w = —0.5 
and a low redshift value w = —1. The specific w(z) 
is given by the CCL or "kink" form [21] with the same 
parameters used in [12]. This is a much more rapid tran- 
sition, and at a lower redshift, than expected in general 
from dark energy, and so also poses a challenging test of 
reconstruction. 



Cosmology 


Description 


ACDM 




= 0.27, w = 


-1 


Mirage 




= 0.27, wo = 


-0.7, w a = -1.09 


Kink 




= 0.27, wo = 


-1, w(z > 0.5) = -0.5 



TABLE I: Input cosmologies used to generate simulated dis- 
tance data from which the GP tries to reconstruct the appro- 
priate kinematic cosmological quantities. 



C. Data and Simulations 

To test the robustness of the reconstruction we perform 
cosmographic fits to simulated data, and then we also 
perform fits to actual current data. For current data we 
use the Union2.1 supernova compilation, including full 
error covariance matrix, consisting of 580 distances from 
z = 0.02 — 1.4. The simulated data consists of the same 
number over the same range, realizing distances using a 
random intrinsic dispersion of 6% in distance, for various 
input cosmologies. 

These different cosmologies are intended both to test 
the robustness of the GP reconstruction and to explore 
the discriminatory power of the reconstruction. They 
are summarized in Table I and all have dimensionlcss 
present matter density Q m = 0.27. One is a ACDM cos- 
mology. Another is a member of the family of mirage 
models [20] , which match the distance to CMB last scat- 
tering of ACDM, using the relation for the dark energy 
equation of state w a = —3.63(1 + Wq). In the limit that 
Wo = — 1 this family reduces to ACDM. Note their phan- 
tom crossing of w = —1 can give unusual features in q{z) 



III. EXPANSION HISTORY RESULTS 

From many realizations of the GP reconstruction we 
reconstruct the kinematic functions rj, H^ 1 = r/', and 
q(if , 77"), where prime denotes d/dz, and their PDFs. We 
show error bands at every redshift defined such that 68% 
of the realizations lie within this range. We emphasize 
that the error band should be interpreted in a redshift by 
redshift sense and the covariances are not visible in such 
a plot; that is, the upper part of the band at one redshift 
may be correlated with the lower part of the band at 
another redshift. 



A. Simulated Data 

For each of the cosmologies in Table I we plot h^ 1 oc 
H^ 1 (z) in Fig. 1 and q(z) in Fig. 2. The true relations are 
given for each cosmology by the long, short, and medium 
dashed curves (the same in all panels of a set). The GP 
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reconstructions arc shown using simulated data based on 
each cosmology in turn, with 68% error bands. If the er- 
ror bands fail to overlap the true relation, the GP recon- 
struction would be inaccurate at 68% confidence level; if 
the error bands fail to overlap the alternate cosmologies' 
true relations, GP is successful at distinguishing these 
models. (Note that these are conservative criteria since 
even if error band overlaps a true relation this does not 
necessarily mean agreement with that cosmology because 
the redshift correlations are not visible.) 

We see that each input cosmology is faithfully recon- 
structed, and each alternate cosmology is properly ex- 
cluded (at 68% confidence or greater). These results 
demonstrate that GP can be a useful statistical tool for 
model independent kinematic parameter estimation. 

Agreement of the error band with the input model is 
a necessary but not wholly sufficient condition for ac- 
curate reconstruction. One needs to take into account 
the correlations between the predictions at each redshift. 
Rather than do a model by model, full likelihood com- 
putation, we tested the influence of correlations between 
rcdshifts through model independent, simple statistics. 
The first used the Om function [22] of the Hubble pa- 
rameter that serves as a straightforward consistency test 
of ACDM, and the second example used the decelera- 
tion parameter. Looking at the distribution of the dif- 
ferences AOto(0.2,0.9) = Om(z = 0.2) - Om(z = 0.9) 
and Ag(0.2, 0.9), for example, we find agreement with 
the error band results that the GP reconstructions accu- 
rately reproduce the input cosmology values. Since our 
analysis assumes no dynamics, we do not have to split 
components into matter and dark energy, and so our re- 
sults would apply to data generated with different input 
fl m (as we have tested) and even cases with coupling 
between them. 



B. Current Data 

We now apply the model-independent constraints from 
GP to the expansion history reconstructed from actual 
current data. The Union2.1 compilation [23] carried out 
a homogeneous, blind, systematics-oriented analysis of 
supernova distance data. We use their full data covari- 
ance matrix for the statistical plus systcmatics uncertain- 
ties. The kinematic reconstruction results are presented 
in Figs. 3 and 4. 

The ACDM model with f2 m = 0.27 (now not an input 
for the mock data, but a comparison to the fit) is found 
to be in strong concordance with the Union2.1 data. 

However, we cannot distinguish ACDM from the mi- 
rage family of models, even one with as extreme time 
variation as wo = —0.7, w a = —1.09. Partly this 
is due to the best fit from current data lying between 
the two, roughly corresponding to a mirage model with 
wo = —0.85, w a = —0.54, and partly due to using the 
full covariance matrix with systcmatics for current data, 
which gives larger error bars than the simulated statisti- 
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FIG. 1: GP reconstructions of the inverse Hubble parame- 
ter h~ 1 (z) oc H~ 1 (z) are given for simulated data based on 
the three different input cosmologies of Table I. The dashed 
curves, the same in all panels, give the true relations. The 
error band on each reconstruction represents the 68% confi- 
dence level. The reconstruction in each case faithfully agrees 
with the input cosmology. 
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FIG. 2: As Fig. 1, but for the deceleration parameter q(z). 



cal errors used in the previous plots. 

Current data docs point unambiguously to current ac- 
celeration in this model independent reconstruction, with 
q < at low redshift with strong significance. However, 
current kinematic data does not indicate when dark en- 
ergy fades into the past, i.e. q > is not required at z > 1 
from this data. 



FIG. 3: GP reconstruction of the inverse Hubble parame- 
ter oc using the Union2.1 data compilation is 
given by the shaded error band representing the 68% confi- 
dence level. 
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FIG. 4: As Fig. 3, but for the deceleration parameter q(z). 



IV. CONCLUSIONS 

Gaussian processes can be successfully used in a sub- 
stantially nonparametric reconstruction of the kinematic 
quantities characterizing the cosmic expansion. We sim- 
ulated several cosmologies and found that GP chose the 
correct one each time. GP also has the advantage of sim- 
ple, well controlled propagation of errors and covariances 
to derivatives (or integrals) of the function, allowing dis- 
tance relations to be converted to the Hubble length or 
second derivative (which can then be analytically propa- 
gated to the deceleration parameter). 

Key ingredients entering GP are the mean function and 
covariance function, with their hyperparameters. We em- 
phasized caution in adopting such functions that might 
impose (hidden) restrictions on the reconstruction, e.g. 
through the coherence length, possibly leading to results 
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retaining memory of the starting point. Proper treat- 
ment of hyperparameters through weighted integration 
over their space and sufficiently wide priors is essential. 
The combination of smoothing based on the data itself, 
iteration to remove initial conditions, and a set of mean 
functions to enable diversity in GP amplitudes and corre- 
lations seems to deliver robust results based on our tests. 
Future work aims at refining this approach further, in 
particular studying the stability of results for a broad 
range of input cosmologies. 

Our reconstructions of H~ 1 (z) and q(z) using current 
data are consistent with ACDM, but also with cosmolo- 
gies with substantial time variation. In particular, this 
holds for the mirage class of models, which preserve the 
distance to CMB last scattering and so the addition of 
CMB data to the supernova data will not affect this con- 
clusion. Dynamical probes, such as growth, in combi- 
nation with the kinematic distance measurements used 
can have further leverage, but mirage models also have 
substantially similar growth to ACDM; for example our 
extreme mirage model agrees in growth as a function of 
redshift to within 1.5% with ACDM with the same mat- 
ter density. 

While mirage models cross w = — 1, no conclusions 
can be drawn from the data regarding the necessity for 
such crossing to occur. Furthermore, until future data 
accuracy constrains the time variation of the equation 
of state to below w a < 0.5 (the current best fit mirage 
value), crossing cannot be said to be tested significantly. 

Current distance data has insufficient leverage on the 
higher order kinematic quantities, such as the deceler- 
ation parameter q(z). It docs definitely show, by this 
substantially nonparametric approach, that cosmic ac- 
celeration is occurring at low redshift. The transition 
from acceleration to deceleration, however, could have 
happened at any redshift z > 0.7, or even not at all, 
according to this current data. Future kinematic data 
extending to redshifts z > 1 are necessary to resolve all 
the issues of substantial time variation, phantom cross- 
ing, and the onset of acceleration. Future applications 
of GP include projection of such constraints from future 
surveys, possibly allowing for spatial curvature, and tests 
of modified gravity, say, through growth vs. expansion. 
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Appendix A: GP Mean Function 

The GP formalism gives a clear formulation for deriva- 
tion of the kinematic functions and their derivatives using 
the data, however there are some practical technical de- 
tails that require care. One of the important issues is the 
initial guess for the mean function. The final results turn 
out not to be independent of this for an arbitrary choice, 
but can in fact retain some memory of the initial choice, 
biasing the results. 

One reason for this is because of the multiscale na- 
ture of the data for an arbitrary cosmology. With two 
square-exponcntial-kcrncl hyperparameters, one for co- 
herence length and one for the amplitude of deviations 
from the mean function, there exists limited freedom for 
the GP to track the deviations of the data from the in- 
put mean function redshift by redshift. One solution is 
to use more hyperparameters for the GP covariance func- 
tion but this leads to a greater computational burden and 
the possibility of fitting the noise in the data rather than 
the cosmological signal. 

The residuals of the data around different mean func- 
tions can be quite different. Certainly there is little ex- 
pectation that any of these residuals are perfectly de- 
scribed by a Gaussian Process with a particular kernel. 
It is therefore not unexpected for model predictions from 
different mean functions to be statistically inconsistent. 

To give another view on this, consider the GP likeli- 
hood probability in more detail. It contains three parts: 
the usual x 2 , the determinant of the GP likelihood func- 
tion penalizing overcomplcxity, and a constant contribu- 
tion involving the number of data points [10], 

2\np{y\f) = -y T £ 00 (Z,Z)-Vlndct£oo(Z,Z)-nhn>), 

(Al) 

GP tries to find the best combination of first and second 
parts to get the highest likelihood. That is, it tries to 
make / as close as possible to the data y by making 
minimum changes to the given mean function to get a 
reasonable \ 2 an d at the same time tries to keep the 
results smooth enough to get a high likelihood from the 
second term. Note the second term is independent of the 
data and arises solely from the hyperparameters. (This 
simplified explanation is somewhat complicated by the 
weighted integration over the hyperparametcr space, but 
the basic flavor of it holds.) 

The ultimate model-independent input mean function 
is the zero mean function. Here, however, we need a 
large aj to bring / close to y; to apply this "correction" 
to zero input over a large redshift range, without merely 
being a constant offset, requires a large coherence length 
I. While one might then succeed in fitting / to y, this 
comes at the price of smoothing away the features and 
losing accurate reconstruction of y' and y" . Conversely, 
if we choose an input mean function that gives / close to 
y at several redshifts, then GP wants to keep af small to 
change the input function as little as possible. A small 
(7/ effectively makes GP moot and so the reconstruction 



8 



never strays far from the input function, with the results 
retaining memory of the input. 

We have verified these properties by investigating a 
large variety of mean functions: 1) zero mean function, 2) 
flat ACDM model with a fixed Q m , 3) flat ACDM model 
with Vl rn as an added hyperparameter, 4) flat wCDM 
model with Q m and w as added hyperparameters. The 
issues raised above show up clearly. 

To get around these problems we want to use an input 
mean function with little cosmology dependence (for ro- 
bust results), sensitivity to multiple scales (for flexibility 
in fitting y well enough that the derivatives are recon- 
structed well), and few added hyperparameters (for com- 
putational tractability) . The solution we have adopted 
after extensive testing is an iterated smoothing approach, 
building on the method developed in [24-26]. 

We emphasize that the smoothing is only to generate 
an initial mean function. The data is smoothed over a 
scale A in ln(l + z), after "pre- whitening" using an initial 
guess d 9 L [26]: 



\nd L (z, A) s - lnd L (z) 9 = N(z) x 



(A2) 
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This procedure is then iterated, with the output of 
one iteration serving as the initial guess of the succes- 
sive iteration. The final reconstructed results have been 
shown to be independent of the first initial guess [24- 
26]. Because the final iteration is a smooth function, 
we can take the derivatives of the mean functions as re- 
quired for computing /' and /". To incorporate many 
scales we actually use a set of 5 initial guess mean func- 
tions, iterating each one independently The scatter in 
the final mean functions or the appropriate derivatives 
then is added, weighted by likelihood, as an additional 
uncertainty, in the statistical sense of the mean squared 
error known as risk: the quadratic sum of the mean dis- 
persion and the dispersion in the mean. We find that 
stopping the iteration procedure when the % 2 of each lies 
within A\ 2 = 2.3 of each other (equivalent to la for 2 
degrees of freedom, hence as much dispersion as from I 
and aj) gives robust results, as seen from the accurate re- 
constructions achieved for the ACDM, mirage, and kink 
simulations. In order to capture the true data it is im- 
portant to have inputs that can cross the true cosmol- 
ogy over different redshift ranges, so the 5 initial guesses 
cover a wide range of behaviors, ACDM cosmologies with 
(n mj n A ) = (1,0), (0,1), (0,0), (0.3, 0.7), (0.5, 0.5). Note 
that no additional hyperparameters are introduced. 



This iterated smoothing set approach to the mean 
function has the desired properties of not relying on 
a specific cosmological form and having freedom from 
memory of the initial guess, while allowing the CP for- 
malism to balance the different terms in the likelihood 
and give accurate reconstructions with well characterized 
errors. The tests run for reconstruction of the different 
cosmologies as shown in Figs. 1 and 2 demonstrate its 
success. 



Appendix B: Hyperparameter Distribution 

The hyperparameters of the covariance function are 
another ingredient for the GP reconstruction. Each set 
of hyperparameters, in our case aj and I, gives rise to 
a particular likelihood by Eq. (Al), for each of the five 
mean functions. 

The posterior distribution is derived using Bayes' the- 
orem 



P(i,a},l) = 



L(i,aj,l)p(i)p(aj)p(l) 



Z)»=i / L(i,aj, 0p(*)p( cr /)p(0 < ^ ln Vfdhil' 

(Bl) 

where i is the index for the five mean functions, p(i) = 
1/5, and the other priors are flat in the logarithm for 
10~ 5 < a) < 1 and 10~ 2 < I < 1.6. The lower limit on 
a'j and upper limit of I do impose non-trivial truncation 
of the likelihood surface as discussed below. Note that 
setting a minimum I = 10 -2 is equivalent to imposing a 
blurring on the square-exponential kernel, and prevents 
fitting to merely noise in the data. Realizations of /, 
/', and /" are drawn from the Gaussian Process models 
represented by this posterior. 

Figure 5 illustrates the role of aj and / in the recon- 
struction. Basically 07 acts to set the amplitude for de- 
viations from the mean function and I controls the wig- 
gliness, or coherence scale. 

As discussed in Appendix A, a small value of aj repre- 
sents little contribution of GP to the reconstruction pro- 
cess, i.e. the result is basically just the mean function. 
Large values of I smooth over features in the data and 
basically give merely an offset that could be absorbed 
in the amplitude. If we included arbitrarily small aj or 
large I in the hyperparameter ranges, these regions of 
the space would give nearly identical likelihoods and di- 
lute the overall probabilities, biasing the results toward 
the mean function. To avoid this situation we impose 
the lower limit logo -2 > —5 (i.e. ignoring models chang- 
ing the mean function by less than 0.7%) and the upper 
limit / < 1.6 (i.e. the range of the data). We have checked 
that the final best fits for the hyperparameters are not 
significantly affected by small variations in the priors. 
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FIG. 5: Schematic plot of the effects of oj and I on the re- 
construction function f(z). The solid red curve has I = 0.1 
and cr? = 0.001 and will serve as a reference. The two dark, 
blue dotted lines show the impact of changing I, with I = 0.01 
(wiggly line) and Z = 1.0 (nearly smooth line), keeping aj 
unchanged. The two light, green dashed lines show the im- 
pact of changing 07, with a'j =0.1 (increased amplitude of 
deviations in f(z)) and a 2 f = 0.00001 (decreased amplitude), 
keeping I at the reference value. For very small values of cr?, 
GP makes very little contribution (near zero modification at 
all scales), while for high values of I possible features of the 
data might be smoothed out. 
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